Ultrahigh dimensional time course feature selection.

Authors

  • Peirong Xu
  • Lixing Zhu
  • Yi Li
Abstract

Statistical challenges arise from modern biomedical studies that produce time course genomic data with ultrahigh dimensions. In a renal cancer study that motivated this paper, the pharmacokinetic measures of a tumor suppressor (CCI-779) and expression levels of 12,625 genes were measured for each of 33 patients at 8 and 16 weeks after the start of treatments, with the goal of identifying predictive gene transcripts and the interactions with time in peripheral blood mononuclear cells for pharmacokinetics over the time course. The resulting data set defies analysis even with regularized regression. Although some remedies have been proposed for both linear and generalized linear models, there are virtually no solutions in the time course setting. As such, a novel GEE-based screening procedure is proposed, which only pertains to the specifications of the first two marginal moments and a working correlation structure. Different from existing methods that either fit separate marginal models or compute pairwise correlation measures, the new procedure merely involves making a single evaluation of estimating functions and thus is extremely computationally efficient. The new method is robust against the mis-specification of correlation structures and enjoys theoretical readiness, which is further verified via Monte Carlo simulations. The procedure is applied to analyze the aforementioned renal cancer study and identify gene transcripts and possible time-interactions that are relevant to CCI-779 metabolism in peripheral blood.
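One possible reading of "a single evaluation of estimating functions" is sketched below: each covariate is scored by the magnitude of its GEE estimating function evaluated once at a null coefficient vector, and the top-scoring covariates are kept. This is an illustration under stated assumptions, not the authors' implementation. It assumes an identity link, a unit variance function, an exchangeable working correlation with a fixed `rho`, and roughly standardized covariates. The function names (`apply_exchangeable_inverse`, `gee_screen`, `simulate`) and the toy data are hypothetical.

```python
import random


def apply_exchangeable_inverse(resid, rho):
    """Return R^{-1} @ resid for R = (1 - rho) * I + rho * J (closed form)."""
    m = len(resid)
    shrink = rho / (1.0 + (m - 1) * rho)
    total = sum(resid)
    return [(r - shrink * total) / (1.0 - rho) for r in resid]


def gee_screen(X, Y, rho=0.0, top_k=5):
    """Rank covariates by |g_j(0)|, the j-th estimating function at beta = 0.

    X[i][t][j] is covariate j for subject i at visit t; Y[i][t] is the response.
    Assumes an identity link and unit variance function (illustrative choice).
    """
    n, p = len(Y), len(X[0][0])
    # With beta = 0 the fitted mean reduces to an intercept; use the grand mean.
    y_bar = sum(sum(y_i) for y_i in Y) / sum(len(y_i) for y_i in Y)
    scores = [0.0] * p
    for x_i, y_i in zip(X, Y):
        weighted = apply_exchangeable_inverse([y - y_bar for y in y_i], rho)
        for x_it, w_t in zip(x_i, weighted):
            for j in range(p):
                scores[j] += x_it[j] * w_t / n
    ranked = sorted(range(p), key=lambda j: abs(scores[j]), reverse=True)
    return ranked[:top_k], scores


def simulate(n=100, visits=2, p=500, seed=0):
    """Hypothetical toy data: only covariates 0 and 1 drive the response."""
    rng = random.Random(seed)
    X, Y = [], []
    for _ in range(n):
        subject_effect = rng.gauss(0.0, 0.5)  # induces within-subject correlation
        x_i = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(visits)]
        y_i = [3.0 * x[0] - 3.0 * x[1] + subject_effect + rng.gauss(0.0, 1.0)
               for x in x_i]
        X.append(x_i)
        Y.append(y_i)
    return X, Y


X, Y = simulate()
selected, scores = gee_screen(X, Y, rho=0.3, top_k=5)
print(sorted(selected))
```

No model is fitted per covariate: the whole ranking comes from one pass over the data, which is the computational point the abstract emphasizes. The closed-form inverse is used here only to keep the sketch dependency-free; at the scale of the motivating study (12,625 genes) a vectorized linear algebra library would be the more natural choice.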

Similar resources

A selective overview of feature screening for ultrahigh-dimensional data.

High-dimensional data have frequently been collected in many scientific areas including genomewide association study, biomedical imaging, tomography, tumor classifications, and finance. Analysis of high-dimensional data poses many challenges for statisticians. Feature selection and variable selection are fundamental for high-dimensional data analysis. The sparsity principle, which assumes that ...

Ultrahigh Dimensional Feature Screening via RKHS Embeddings

Feature screening is a key step in handling ultrahigh dimensional data sets that are ubiquitous in modern statistical problems. Over the last decade, convex relaxation based approaches (e.g., Lasso/sparse additive model) have been extensively developed and analyzed for feature selection in high dimensional regime. But in the ultrahigh dimensional regime, these approaches suffer from several pro...

Feature Selection for Varying Coefficient Models With Ultrahigh Dimensional Covariates.

This paper is concerned with feature screening and variable selection for varying coefficient models with ultrahigh dimensional covariates. We propose a new feature screening procedure for these models based on conditional correlation coefficient. We systematically study the theoretical properties of the proposed procedure, and establish their sure screening property and the ranking consistency...

Towards Large-scale and Ultrahigh Dimensional Feature Selection via Feature Generation

In many real-world applications such as text mining, it is desirable to select the most relevant features or variables to improve the generalization ability, or to provide a better interpretation of the prediction models. In this paper, a novel adaptive feature scaling (AFS) scheme is proposed by introducing a feature scaling vector d ∈ [0, 1] to alleviate the bias problem brought by the scalin...

Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features

Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...

Journal title:
  • Biometrics

Volume 70, Issue 2

Pages: -

Publication date: 2014